From the TED Talk by Kenneth Cukier: Big data is better data

Unscramble the Blue Letters

Machine learning is at the basis of many of the things that we do online: seacrh engines, Amazon's personalization algorithm, computer taoarnsitln, voice recognition systems. Researchers recently have looked at the qoeuitsn of biopsies, cancerous biipseos, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to idenitfy the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the ttaris were ones that pploee didn't need to look for, but that the machine spotted.

Open Cloze

Machine learning is at the basis of many of the things that we do online: ______ engines, Amazon's personalization algorithm, computer ___________, voice recognition systems. Researchers recently have looked at the ________ of biopsies, cancerous ________, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to ________ the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the ______ were ones that ______ didn't need to look for, but that the machine spotted.

Solution

  1. question
  2. traits
  3. people
  4. search
  5. biopsies
  6. identify
  7. translation

Original Text

Machine learning is at the basis of many of the things that we do online: search engines, Amazon's personalization algorithm, computer translation, voice recognition systems. Researchers recently have looked at the question of biopsies, cancerous biopsies, and they've asked the computer to identify by looking at the data and survival rates to determine whether cells are actually cancerous or not, and sure enough, when you throw the data at it, through a machine-learning algorithm, the machine was able to identify the 12 telltale signs that best predict that this biopsy of the breast cancer cells are indeed cancerous. The problem: The medical literature only knew nine of them. Three of the traits were ones that people didn't need to look for, but that the machine spotted.

Frequently Occurring Word Combinations

ngrams of length 2

collocation frequency
big data 14
arthur samuel 6
machine learning 4
favorite pie 2
supermarket sales 2
smaller amounts 2
term big 2
small data 2
national security 2
security agency 2
martin luther 2
telltale signs 2
samuel knew 2

ngrams of length 3

collocation frequency
term big data 2
national security agency 2
arthur samuel knew 2

Important Words

  1. algorithm
  2. asked
  3. basis
  4. biopsies
  5. biopsy
  6. breast
  7. cancer
  8. cancerous
  9. cells
  10. computer
  11. data
  12. determine
  13. engines
  14. identify
  15. knew
  16. learning
  17. literature
  18. looked
  19. machine
  20. medical
  21. people
  22. personalization
  23. predict
  24. question
  25. rates
  26. recognition
  27. researchers
  28. search
  29. signs
  30. spotted
  31. survival
  32. systems
  33. telltale
  34. throw
  35. traits
  36. translation
  37. voice